-
Notifications
You must be signed in to change notification settings - Fork 22
use the source field to match executors with their stats #15
base: master
Are you sure you want to change the base?
Conversation
|
Looks generally good to me, but I don't have personal experience with Mesos. Is the source (not sure what exactly that field is) unique enough between all frameworks, executors, and tasks to make this map assignment not have any unwanted collisions? https://github.com/evertrue/mesos_exporter/blob/master/main.go#L377 |
|
@TNorden to what are you setting I somewhat fear that this might break stats reporting for tasks spawned using Aurora. |
|
@StephanErb Singularity takes care of setting the source, it basically manages our deploys to mesos. Here's and example of what I see when I hit the two endpoints used to gather metrics from the slaves:
You can see that with the two examples sometimes |
|
Just checked with Aurora:
That looks similar to your example and would not be broken by your patch (so my gut feeling was wrong here). Still, I believe there might be some other problem here.
|
|
I'm confused – this is clearly an overwrite. Either we are dropping something or don't need the inner loop to begin with. |
|
For Aurora and Singularity each executor just runs one task, so the pull request kinda works. However it would not work for frameworks which start multiple tasks per executor. |
|
I'll try to summarize the issue:
The root cause (of why scraping is currently broken) is that the code on |
|
@StephanErb Thanks for the explanation! Does that mean that the exporter's metrics structure is fundamentally broken then if resource metrics like I have never touched Mesos, so I'm relying on other people to decide on the right solution here. I also know that @discordianfish was interested in replacing this Mesos exporter entirely with his own Mesos exporter: https://groups.google.com/forum/#!searchin/prometheus-developers/mesos$20johannes/prometheus-developers/8jVQlQIKo-M/gcN_P-S6AwAJ All the Mesos people here have to come together and decide what's the best way forward :) |
|
My 2 cents If metrics are provided by mesos per executor I makes sense to expose it the same way through prometheus. But I think it should be labeled by the executorId and not this source which I never really seen used anywhere. Even if a bit of lie I still think it should report the taskId(s) with the metrics and in most situations it would just work. Repeating the metric for each task of the executor with a different task label would make it work for both. Only caveats with that if you actually have >1 tasks somewhere and you do a sum or avg you need to do that by executor and not by task. |
This would make it incorrect to repeat the metric for each task as a sum isn't safe in all cases. |
|
So yeah, I think that https://github.com/mesosphere/mesos_exporter provides better metrics. I might have missed something and unfortunately I'm not using mesos anymore. |
we are running mesos 0.23.0 with HubSpot's Singularity Framework, for some reason there were a handful of executors where the task id did not line up with the statistics' source field. This fixed the problem for us.